Upgrade DDK and Resolve Data.all Pipelines #866

Merged: noah-paige merged 13 commits into main from upgrade-ddk-dataall-pipelines on Nov 28, 2023

Conversation

@noah-paige (Contributor) commented Nov 10, 2023

Feature or Bugfix

  • Enhancement

Detail

  • Upgrade DDK Core library to v1.3.0
  • Remove calls to DDK CLI Library (deprecated)
  • Edit blueprints/ to work with a CDK-native App and use the DDK Configurator (see the sketch below)

Also as part of this PR:

  • Added the ability to update CDK Pipeline stacks (previously not supported)
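
For illustration, below is a minimal sketch of what a CDK-native blueprint app wired to the DDK Configurator could look like. The stack class, the example pipeline name, and the Configurator call signature (scope, config path, environment id) are assumptions based on the DDK v1.x documentation, not the actual data.all blueprint code:

```python
# Sketch of a blueprints/ app.py after the upgrade (hypothetical, simplified)
import aws_cdk as cdk
from aws_ddk_core import Configurator  # DDK Core v1.x

app = cdk.App()
environment_id = app.node.try_get_context("environment_id") or "dev"

# Assumed Configurator usage: read ddk.json and apply the per-environment
# settings (account, region, tags, ...) to the CDK app
Configurator(app, "./ddk.json", environment_id)


class DataallPipelineStack(cdk.Stack):
    """Placeholder stack; the real blueprint defines the pipeline resources here."""

    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)


pipeline_name = "example-pipeline"  # hypothetical value
# Stack naming convention discussed later in this PR
DataallPipelineStack(app, f"{pipeline_name}-{environment_id}-DataallPipelineStack")

app.synth()
```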

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on the OWASP Top 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use standard, proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@noah-paige (Contributor, Author) commented Nov 10, 2023

Testing in AWS Account:

  • Same Account CodePipeline Trunk

  • Same Account CodePipeline GitFlow

  • Same Account CDK Pipeline

  • Cross Account CodePipeline Trunk

  • Cross Account CodePipeline GitFlow

  • Cross Account CDK Pipeline

  • CDK Pipeline 3 Envs (CICD + Dev in same, Prod in cross)

  • Trunk-based CodePipeline 3 Envs (CICD + Dev in same, Prod in cross)

Also verified that the Pipeline Stack(s) are tagged with the same key-value pairs as before.

@noah-paige noah-paige marked this pull request as ready for review November 13, 2023 20:00
@noah-paige (Contributor, Author) commented:
Two thoughts to enhance this PR:

  1. Currently, the bootstrapping required for multi-account pipelines can be a little tricky - you have to edit the original bootstrap command with an additional `--trust` to the CICD AWS Account to get pipelines working without interfering with the previous data.all Environment set-up (an example command is sketched after this list).
  • We could look into creating a new cdk bootstrap stack specifying a new toolkit name and new qualifier or prefix, similar to what the DDK CLI used to do in v0.5.1 (`ddk` constructs support this by adding bootstrap-specific parameters in `ddk.json`; see the [ddk docs](https://awslabs.github.io/aws-ddk/release/latest/how-to/custom-bootstrap.html))
  • But this will require additional manual work for users to run these bootstrap commands for their pipeline environments before creating pipelines - and may be prone to error
  2. Should we move dataall/backend/dataall/modules/datapipelines/blueprints/ to under the dataall/backend/dataall/modules/datapipelines/cdk/ directory to keep the module file structure standardized across all modules?
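
For reference, a hedged example of the kind of edited bootstrap command described in point 1, using standard CDK CLI flags (`--trust`, and optionally `--qualifier`/`--toolkit-stack-name` for a separate toolkit stack); the account IDs, region, and qualifier below are placeholders:

```bash
# Bootstrap a target (e.g. DEV/PROD) account so the CICD account can deploy into it
cdk bootstrap aws://111111111111/eu-west-1 \
  --trust 222222222222 \
  --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess

# Variant of the separate-toolkit idea: dedicated toolkit stack name and qualifier
cdk bootstrap aws://111111111111/eu-west-1 \
  --trust 222222222222 \
  --qualifier ddkpipe \
  --toolkit-stack-name DdkPipelineToolkit \
  --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess
```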

@noah-paige noah-paige requested a review from dlpzx November 15, 2023 15:19
@dlpzx (Contributor) commented Nov 24, 2023

> 1. Currently, the bootstrapping required for multi-account pipelines can be a little tricky - you have to edit the original bootstrap command with an additional `--trust` to the CICD AWS Account [...]
> 2. Should we move `dataall/backend/dataall/modules/datapipelines/blueprints/` to under the `dataall/backend/dataall/modules/datapipelines/cdk/` directory [...]

1. Maybe it is time to collect some feedback from customers and DDK experts on their recommendations. Personally, I think it is better to separate the CDKToolkit stacks. That way we can (1) limit the permissions for each stack and (2) avoid overwriting trust relationships.
2. Definitely, if possible.

@@ -54,11 +54,11 @@ def get_statements(self):
],
Review comment (Contributor):
I have always had my doubts about this CloudFormation policy. At the moment it is needed because DDK does not enforce a naming convention on the stacks. Do you think we should work on restricting it?
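
For context, the kind of restriction being discussed could scope the statement to stacks that follow an agreed naming convention instead of all stacks. A rough sketch (the action list and name pattern are illustrative, not the current data.all policy):

```python
from aws_cdk import aws_iam as iam

# Illustrative only: limit CloudFormation actions to pipeline stacks that
# follow a known naming convention, rather than granting them on "*"
restricted_cloudformation_statement = iam.PolicyStatement(
    effect=iam.Effect.ALLOW,
    actions=[
        "cloudformation:DescribeStacks",
        "cloudformation:CreateStack",
        "cloudformation:UpdateStack",
        "cloudformation:DeleteStack",
    ],
    resources=["arn:aws:cloudformation:*:*:stack/*DataallPipelineStack*/*"],
)
```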

@dlpzx (Contributor) commented Nov 24, 2023

I have reviewed the code and overall it looks good. The next steps can go in the direction of reading the blueprint from a separate repository, but that's something we will evaluate as a whole when reviewing pipelines.

@noah-paige (Contributor, Author) commented Nov 24, 2023

I moved blueprints/ to under the cdk/ parent directory and re-tested create, update, and delete with the latest changes.

On the bootstrapping requirements, I definitely think it is something we should discuss - both how customers are currently using pipelines and what would be best from DDK experts in this space.

As is, I think this PR is good for final review unless we have a clear direction on how we want to proceed on the bootstrapping pattern for pipelines (currently everything works as is, so I am viewing this as an "enhancement" that could be handled separately if needed, but let me know your thoughts).

@dlpzx (Contributor) commented Nov 27, 2023

I agree on leaving bootstrapping out of this PR as an enhancement, so as not to overcomplicate this review.
I am testing the feature as is now in AWS:

  • CICD successfully deployed

Using the same account for CICD and DEV, the following pipelines are created and executed successfully:

  • CDK-trunk
  • CodePipeline trunk
  • CodePipeline gitflow

Using one account for CICD and DEV and another for PROD, the following pipelines are created and executed successfully:

  • CDK-trunk
  • CodePipeline trunk
  • CodePipeline gitflow

Comments
As an enhancement, we could slightly change the names/descriptions of the CloudFormation stacks.
[screenshot]

  • For the CDK pipeline there is no way to identify the pipeline stack other than the pipelineUri in the name. I think we can directly add a description field to the DDK CICDPipelineStack construct (Docs) in dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py
  • For the data pipeline defined in dataall/modules/datapipelines/cdk/blueprints/data_pipeline_blueprint/app.py we could add the environment to the stack name, just in case DEV and TEST are deployed in the same account: f"{pipeline_name}-{environment_id}-DataallPipelineStack"

@noah-paige
Copy link
Contributor Author

@dlpzx - added the enhancements to the name/description of the CFN resources (see the sketch after this list):

  1. CDK Pipelines CFN Description - Added a stack description with the same formatting as the other CFN stacks created by data.all: `Cloud formation stack of PIPELINE: {Pipeline Label}; URI: {Pipeline URI}; DESCRIPTION: {Pipeline Description}`

  2. DataPipeline Stacks - Added environment_id to the naming convention of stacks created via the CodePipeline patterns (i.e. "{pipeline_name}-{environment_id}-DataallPipelineStack")
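
A hedged sketch of the two changes above. It assumes the DDK v1.x CICDPipelineStack accepts environment_id/pipeline_name props and forwards standard cdk.Stack keyword arguments such as description; all metadata values below are placeholders:

```python
import aws_cdk as cdk
from aws_ddk_core import CICDPipelineStack  # DDK Core v1.x

app = cdk.App()

# Placeholder pipeline metadata; data.all substitutes the real values
pipeline_label = "my-pipeline"
pipeline_uri = "abcd1234"
pipeline_description = "Example description"
environment_id = "prod"

# 1. Stack description formatted like the other data.all CloudFormation stacks
CICDPipelineStack(
    app,
    f"dataall-pipeline-{pipeline_label}-{pipeline_uri}",  # hypothetical stack id
    environment_id="cicd",         # assumed prop name, per DDK v1.x docs
    pipeline_name=pipeline_label,  # assumed prop name, per DDK v1.x docs
    description=(
        f"Cloud formation stack of PIPELINE: {pipeline_label}; "
        f"URI: {pipeline_uri}; DESCRIPTION: {pipeline_description}"
    ),
)

# 2. environment_id added to the data pipeline stack name so that DEV and TEST
#    deployed into the same account do not clash
data_pipeline_stack_name = f"{pipeline_label}-{environment_id}-DataallPipelineStack"

app.synth()
```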

Tested again by creating new pipelines after the above changes, including a trunk-based CodePipeline with 2 pipeline envs in the same account (i.e. dev and test both in 1 account).

I think this is ready for another round of review.

@dlpzx (Contributor) commented Nov 28, 2023

Hi @noah-paige, I am doing a final review in AWS:

  • CICD successfully deployed

Using one account for CICD and DEV and another for PROD, the following pipelines are created and executed successfully:

  • CDK-trunk
  • CodePipeline trunk
  • CodePipeline gitflow - it failed once (see screenshot below), but when retrying it worked fine

@dlpzx (Contributor) left a review comment:

I ran into an issue on the gitflow pipeline deployment. The pipeline stack fails with the following error message:
[screenshot]

This error only appeared once, and when checking the template in CloudFormation I could not find any errors: the bucket was defined in the template and correctly referenced by the policy.

@dlpzx self-requested a review November 28, 2023 07:46
@noah-paige merged commit 8ca73e6 into main Nov 28, 2023
9 checks passed
@noah-paige linked an issue Nov 28, 2023 that may be closed by this pull request
@dlpzx deleted the upgrade-ddk-dataall-pipelines branch January 10, 2024 13:12
Issue that may be closed by this pull request: Upgrade data.all pipelines DDK version to DDK v2